Explore Loan Data by Lanmixue Mao (Michelle)


The dataset comprises 81 variables and 113,937 entries. The variables explored in this analysis are the following:

Term : Length of the loan in months

LoanStatus : Current status of the loan, such as Chargedoff, Completed, Defaulted, etc.

EstimatedEffectiveYield : Estimated yield to lenders: the borrower interest rate minus the servicing fee and estimated uncollected interest on charge-offs, plus estimated collected late fees

ProsperScore : Risk score from 1 to 10, with 10 being the least risky (note that the summary below shows a maximum of 11 in the data)

BorrowerAPR : The Borrower’s Annual Percentage Rate (APR) for the loan.

BorrowerRate : The Borrower’s interest rate for this loan.

ListingCategory..numeric. : Category of the listing selected by the borrower, coded as a number from 0 to 20

EmploymentStatus : Current type of employment

Occupation : Occupation of borrower at the time of listing

EmploymentStatusDuration : Length of the borrower's current employment status, in months

IsBorrowerHomeowner : Whether the borrower owns a home at the time of listing (True & False)

ProsperRating..Alpha. : Prosper rating for borrowers as a letter grade

IncomeVerifiable : Whether the income of the borrower is verifiable at the time of listing (True & False)

StatedMonthlyIncome : Monthly income of the borrower

MonthlyLoanPayment : Monthly loan payment amount

Recommendations : Number of recommendations the borrower has at the time of listing

DebtToIncomeRatio : The debt to income ratio of the borrower at the time the credit profile was pulled.

LoanOriginalAmount : Original amount of the loan

LoanOriginationQuarter : Quarter of the year in which the loan was originated

A basic exploration of the dataset gives the following information:


##                    ListingKey     ListingNumber    
##  17A93590655669644DB4C06:     6   Min.   :      4  
##  349D3587495831350F0F648:     4   1st Qu.: 400919  
##  47C1359638497431975670B:     4   Median : 600554  
##  8474358854651984137201C:     4   Mean   : 627886  
##  DE8535960513435199406CE:     4   3rd Qu.: 892634  
##  04C13599434217079754AEE:     3   Max.   :1255725  
##  (Other)                :113912                    
##                     ListingCreationDate  CreditGrade         Term      
##  2013-10-02 17:20:16.550000000:     6          :84984   Min.   :12.00  
##  2013-08-28 20:31:41.107000000:     4   C      : 5649   1st Qu.:36.00  
##  2013-09-08 09:27:44.853000000:     4   D      : 5153   Median :36.00  
##  2013-12-06 05:43:13.830000000:     4   B      : 4389   Mean   :40.83  
##  2013-12-06 11:44:58.283000000:     4   AA     : 3509   3rd Qu.:36.00  
##  2013-08-21 07:25:22.360000000:     3   HR     : 3508   Max.   :60.00  
##  (Other)                      :113912   (Other): 6745                  
##                  LoanStatus                  ClosedDate   
##  Current              :56576                      :58848  
##  Completed            :38074   2014-03-04 00:00:00:  105  
##  Chargedoff           :11992   2014-02-19 00:00:00:  100  
##  Defaulted            : 5018   2014-02-11 00:00:00:   92  
##  Past Due (1-15 days) :  806   2012-10-30 00:00:00:   81  
##  Past Due (31-60 days):  363   2013-02-26 00:00:00:   78  
##  (Other)              : 1108   (Other)            :54633  
##   BorrowerAPR       BorrowerRate     LenderYield     
##  Min.   :0.00653   Min.   :0.0000   Min.   :-0.0100  
##  1st Qu.:0.15629   1st Qu.:0.1340   1st Qu.: 0.1242  
##  Median :0.20976   Median :0.1840   Median : 0.1730  
##  Mean   :0.21883   Mean   :0.1928   Mean   : 0.1827  
##  3rd Qu.:0.28381   3rd Qu.:0.2500   3rd Qu.: 0.2400  
##  Max.   :0.51229   Max.   :0.4975   Max.   : 0.4925  
##  NA's   :25                                          
##  EstimatedEffectiveYield EstimatedLoss   EstimatedReturn 
##  Min.   :-0.183          Min.   :0.005   Min.   :-0.183  
##  1st Qu.: 0.116          1st Qu.:0.042   1st Qu.: 0.074  
##  Median : 0.162          Median :0.072   Median : 0.092  
##  Mean   : 0.169          Mean   :0.080   Mean   : 0.096  
##  3rd Qu.: 0.224          3rd Qu.:0.112   3rd Qu.: 0.117  
##  Max.   : 0.320          Max.   :0.366   Max.   : 0.284  
##  NA's   :29084           NA's   :29084   NA's   :29084   
##  ProsperRating..numeric. ProsperRating..Alpha.  ProsperScore  
##  Min.   :1.000                  :29084         Min.   : 1.00  
##  1st Qu.:3.000           C      :18345         1st Qu.: 4.00  
##  Median :4.000           B      :15581         Median : 6.00  
##  Mean   :4.072           A      :14551         Mean   : 5.95  
##  3rd Qu.:5.000           D      :14274         3rd Qu.: 8.00  
##  Max.   :7.000           E      : 9795         Max.   :11.00  
##  NA's   :29084           (Other):12307         NA's   :29084  
##  ListingCategory..numeric. BorrowerState  
##  Min.   : 0.000            CA     :14717  
##  1st Qu.: 1.000            TX     : 6842  
##  Median : 1.000            NY     : 6729  
##  Mean   : 2.774            FL     : 6720  
##  3rd Qu.: 3.000            IL     : 5921  
##  Max.   :20.000                   : 5515  
##                            (Other):67493  
##                     Occupation         EmploymentStatus
##  Other                   :28617   Employed     :67322  
##  Professional            :13628   Full-time    :26355  
##  Computer Programmer     : 4478   Self-employed: 6134  
##  Executive               : 4311   Not available: 5347  
##  Teacher                 : 3759   Other        : 3806  
##  Administrative Assistant: 3688                : 2255  
##  (Other)                 :55456   (Other)      : 2718  
##  EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
##  Min.   :  0.00           False:56459         False:101218    
##  1st Qu.: 26.00           True :57478         True : 12719    
##  Median : 67.00                                               
##  Mean   : 96.07                                               
##  3rd Qu.:137.00                                               
##  Max.   :755.00                                               
##  NA's   :7625                                                 
##                     GroupKey                 DateCreditPulled 
##                         :100596   2013-12-23 09:38:12:     6  
##  783C3371218786870A73D20:  1140   2013-11-21 09:09:41:     4  
##  3D4D3366260257624AB272D:   916   2013-12-06 05:43:16:     4  
##  6A3B336601725506917317E:   698   2014-01-14 20:17:49:     4  
##  FEF83377364176536637E50:   611   2014-02-09 12:14:41:     4  
##  C9643379247860156A00EC0:   342   2013-09-27 22:04:54:     3  
##  (Other)                :  9634   (Other)            :113912  
##  CreditScoreRangeLower CreditScoreRangeUpper
##  Min.   :  0.0         Min.   : 19.0        
##  1st Qu.:660.0         1st Qu.:679.0        
##  Median :680.0         Median :699.0        
##  Mean   :685.6         Mean   :704.6        
##  3rd Qu.:720.0         3rd Qu.:739.0        
##  Max.   :880.0         Max.   :899.0        
##  NA's   :591           NA's   :591          
##         FirstRecordedCreditLine CurrentCreditLines OpenCreditLines
##                     :   697     Min.   : 0.00      Min.   : 0.00  
##  1993-12-01 00:00:00:   185     1st Qu.: 7.00      1st Qu.: 6.00  
##  1994-11-01 00:00:00:   178     Median :10.00      Median : 9.00  
##  1995-11-01 00:00:00:   168     Mean   :10.32      Mean   : 9.26  
##  1990-04-01 00:00:00:   161     3rd Qu.:13.00      3rd Qu.:12.00  
##  1995-03-01 00:00:00:   159     Max.   :59.00      Max.   :54.00  
##  (Other)            :112389     NA's   :7604       NA's   :7604   
##  TotalCreditLinespast7years OpenRevolvingAccounts
##  Min.   :  2.00             Min.   : 0.00        
##  1st Qu.: 17.00             1st Qu.: 4.00        
##  Median : 25.00             Median : 6.00        
##  Mean   : 26.75             Mean   : 6.97        
##  3rd Qu.: 35.00             3rd Qu.: 9.00        
##  Max.   :136.00             Max.   :51.00        
##  NA's   :697                                     
##  OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries   
##  Min.   :    0.0             Min.   :  0.000      Min.   :  0.000  
##  1st Qu.:  114.0             1st Qu.:  0.000      1st Qu.:  2.000  
##  Median :  271.0             Median :  1.000      Median :  4.000  
##  Mean   :  398.3             Mean   :  1.435      Mean   :  5.584  
##  3rd Qu.:  525.0             3rd Qu.:  2.000      3rd Qu.:  7.000  
##  Max.   :14985.0             Max.   :105.000      Max.   :379.000  
##                              NA's   :697          NA's   :1159     
##  CurrentDelinquencies AmountDelinquent   DelinquenciesLast7Years
##  Min.   : 0.0000      Min.   :     0.0   Min.   : 0.000         
##  1st Qu.: 0.0000      1st Qu.:     0.0   1st Qu.: 0.000         
##  Median : 0.0000      Median :     0.0   Median : 0.000         
##  Mean   : 0.5921      Mean   :   984.5   Mean   : 4.155         
##  3rd Qu.: 0.0000      3rd Qu.:     0.0   3rd Qu.: 3.000         
##  Max.   :83.0000      Max.   :463881.0   Max.   :99.000         
##  NA's   :697          NA's   :7622       NA's   :990            
##  PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
##  Min.   : 0.0000          Min.   : 0.000            Min.   :      0       
##  1st Qu.: 0.0000          1st Qu.: 0.000            1st Qu.:   3121       
##  Median : 0.0000          Median : 0.000            Median :   8549       
##  Mean   : 0.3126          Mean   : 0.015            Mean   :  17599       
##  3rd Qu.: 0.0000          3rd Qu.: 0.000            3rd Qu.:  19521       
##  Max.   :38.0000          Max.   :20.000            Max.   :1435667       
##  NA's   :697              NA's   :7604              NA's   :7604          
##  BankcardUtilization AvailableBankcardCredit  TotalTrades    
##  Min.   :0.000       Min.   :     0          Min.   :  0.00  
##  1st Qu.:0.310       1st Qu.:   880          1st Qu.: 15.00  
##  Median :0.600       Median :  4100          Median : 22.00  
##  Mean   :0.561       Mean   : 11210          Mean   : 23.23  
##  3rd Qu.:0.840       3rd Qu.: 13180          3rd Qu.: 30.00  
##  Max.   :5.950       Max.   :646285          Max.   :126.00  
##  NA's   :7604        NA's   :7544            NA's   :7544    
##  TradesNeverDelinquent..percentage. TradesOpenedLast6Months
##  Min.   :0.000                      Min.   : 0.000         
##  1st Qu.:0.820                      1st Qu.: 0.000         
##  Median :0.940                      Median : 0.000         
##  Mean   :0.886                      Mean   : 0.802         
##  3rd Qu.:1.000                      3rd Qu.: 1.000         
##  Max.   :1.000                      Max.   :20.000         
##  NA's   :7544                       NA's   :7544           
##  DebtToIncomeRatio         IncomeRange    IncomeVerifiable
##  Min.   : 0.000    $25,000-49,999:32192   False:  8669    
##  1st Qu.: 0.140    $50,000-74,999:31050   True :105268    
##  Median : 0.220    $100,000+     :17337                   
##  Mean   : 0.276    $75,000-99,999:16916                   
##  3rd Qu.: 0.320    Not displayed : 7741                   
##  Max.   :10.010    $1-24,999     : 7274                   
##  NA's   :8554      (Other)       : 1427                   
##  StatedMonthlyIncome                    LoanKey       TotalProsperLoans
##  Min.   :      0     CB1B37030986463208432A1:     6   Min.   :0.00     
##  1st Qu.:   3200     2DEE3698211017519D7333F:     4   1st Qu.:1.00     
##  Median :   4667     9F4B37043517554537C364C:     4   Median :1.00     
##  Mean   :   5608     D895370150591392337ED6D:     4   Mean   :1.42     
##  3rd Qu.:   6825     E6FB37073953690388BC56D:     4   3rd Qu.:2.00     
##  Max.   :1750003     0D8F37036734373301ED419:     3   Max.   :8.00     
##                      (Other)                :113912   NA's   :91852    
##  TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   :  0.00             Min.   :  0.00       
##  1st Qu.:  9.00             1st Qu.:  9.00       
##  Median : 16.00             Median : 15.00       
##  Mean   : 22.93             Mean   : 22.27       
##  3rd Qu.: 33.00             3rd Qu.: 32.00       
##  Max.   :141.00             Max.   :141.00       
##  NA's   :91852              NA's   :91852        
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   : 0.00                       Min.   : 0.00                  
##  1st Qu.: 0.00                       1st Qu.: 0.00                  
##  Median : 0.00                       Median : 0.00                  
##  Mean   : 0.61                       Mean   : 0.05                  
##  3rd Qu.: 0.00                       3rd Qu.: 0.00                  
##  Max.   :42.00                       Max.   :21.00                  
##  NA's   :91852                       NA's   :91852                  
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   :    0            Min.   :    0              
##  1st Qu.: 3500            1st Qu.:    0              
##  Median : 6000            Median : 1627              
##  Mean   : 8472            Mean   : 2930              
##  3rd Qu.:11000            3rd Qu.: 4127              
##  Max.   :72499            Max.   :23451              
##  NA's   :91852            NA's   :91852              
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-209.00             Min.   :   0.0           
##  1st Qu.: -35.00             1st Qu.:   0.0           
##  Median :  -3.00             Median :   0.0           
##  Mean   :  -3.22             Mean   : 152.8           
##  3rd Qu.:  25.00             3rd Qu.:   0.0           
##  Max.   : 286.00             Max.   :2704.0           
##  NA's   :95009                                        
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber    
##  Min.   : 0.00                 Min.   :  0.0              Min.   :     1  
##  1st Qu.: 9.00                 1st Qu.:  6.0              1st Qu.: 37332  
##  Median :14.00                 Median : 21.0              Median : 68599  
##  Mean   :16.27                 Mean   : 31.9              Mean   : 69444  
##  3rd Qu.:22.00                 3rd Qu.: 65.0              3rd Qu.:101901  
##  Max.   :44.00                 Max.   :100.0              Max.   :136486  
##  NA's   :96985                                                            
##  LoanOriginalAmount          LoanOriginationDate LoanOriginationQuarter
##  Min.   : 1000      2014-01-22 00:00:00:   491   Q4 2013:14450         
##  1st Qu.: 4000      2013-11-13 00:00:00:   490   Q1 2014:12172         
##  Median : 6500      2014-02-19 00:00:00:   439   Q3 2013: 9180         
##  Mean   : 8337      2013-10-16 00:00:00:   434   Q2 2013: 7099         
##  3rd Qu.:12000      2014-01-28 00:00:00:   339   Q3 2012: 5632         
##  Max.   :35000      2013-09-24 00:00:00:   316   Q2 2012: 5061         
##                     (Other)            :111428   (Other):60343         
##                    MemberKey      MonthlyLoanPayment LP_CustomerPayments
##  63CA34120866140639431C9:     9   Min.   :   0.0     Min.   :   -2.35   
##  16083364744933457E57FB9:     8   1st Qu.: 131.6     1st Qu.: 1005.76   
##  3A2F3380477699707C81385:     8   Median : 217.7     Median : 2583.83   
##  4D9C3403302047712AD0CDD:     8   Mean   : 272.5     Mean   : 4183.08   
##  739C338135235294782AE75:     8   3rd Qu.: 371.6     3rd Qu.: 5548.40   
##  7E1733653050264822FAA3D:     8   Max.   :2251.5     Max.   :40702.39   
##  (Other)                :113888                                         
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :    0.0              Min.   :   -2.35   Min.   :-664.87  
##  1st Qu.:  500.9              1st Qu.:  274.87   1st Qu.: -73.18  
##  Median : 1587.5              Median :  700.84   Median : -34.44  
##  Mean   : 3105.5              Mean   : 1077.54   Mean   : -54.73  
##  3rd Qu.: 4000.0              3rd Qu.: 1458.54   3rd Qu.: -13.92  
##  Max.   :35000.0              Max.   :15617.03   Max.   :  32.06  
##                                                                   
##  LP_CollectionFees  LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :-9274.75   Min.   :  -94.2       Min.   : -954.5    
##  1st Qu.:    0.00   1st Qu.:    0.0       1st Qu.:    0.0    
##  Median :    0.00   Median :    0.0       Median :    0.0    
##  Mean   :  -14.24   Mean   :  700.4       Mean   :  681.4    
##  3rd Qu.:    0.00   3rd Qu.:    0.0       3rd Qu.:    0.0    
##  Max.   :    0.00   Max.   :25000.0       Max.   :25000.0    
##                                                              
##  LP_NonPrincipalRecoverypayments PercentFunded    Recommendations   
##  Min.   :    0.00                Min.   :0.7000   Min.   : 0.00000  
##  1st Qu.:    0.00                1st Qu.:1.0000   1st Qu.: 0.00000  
##  Median :    0.00                Median :1.0000   Median : 0.00000  
##  Mean   :   25.14                Mean   :0.9986   Mean   : 0.04803  
##  3rd Qu.:    0.00                3rd Qu.:1.0000   3rd Qu.: 0.00000  
##  Max.   :21117.90                Max.   :1.0125   Max.   :39.00000  
##                                                                     
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors      
##  Min.   : 0.00000           Min.   :    0.00            Min.   :   1.00  
##  1st Qu.: 0.00000           1st Qu.:    0.00            1st Qu.:   2.00  
##  Median : 0.00000           Median :    0.00            Median :  44.00  
##  Mean   : 0.02346           Mean   :   16.55            Mean   :  80.48  
##  3rd Qu.: 0.00000           3rd Qu.:    0.00            3rd Qu.: 115.00  
##  Max.   :33.00000           Max.   :25000.00            Max.   :1189.00  
## 
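
The summary above reports missing values in two ways: as `NA's` rows for numeric columns and as an unlabelled (blank) level for text columns such as ProsperRating..Alpha. A minimal sketch of how both kinds could be counted is shown here. The `loans` data frame is a hypothetical miniature stand-in with made-up values, not the real dataset.

```r
# Hypothetical miniature stand-in for the loan data; the values are made up.
loans <- data.frame(
  BorrowerAPR = c(0.12, NA, 0.21, 0.35),
  ProsperRating..Alpha. = c("A", "", "C", ""),
  ProsperScore = c(7, NA, 5, NA),
  stringsAsFactors = FALSE
)

# Count NA values per column.
na_counts <- colSums(is.na(loans))

# Count empty strings per column; NA comparisons are ignored via na.rm.
blank_counts <- sapply(loans, function(col) sum(col == "", na.rm = TRUE))

print(na_counts)
print(blank_counts)
```

Counting blanks separately matters here because `is.na()` alone would report no missing Prosper ratings.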

The Prosper loan data allows me to explore:

  1. The difference between borrower APR, borrower rate and lender yield. This shows what interest rate the borrower actually pays.
  2. The relationship between borrower income and loan payment, which indicates how likely the borrower is to repay a debt.
  3. Other information: general borrower credit status and credit risk analysis.

Univariate Plots Section

Basic Loan Information

The frequency plot shows that the most common borrower APR (nearly 10,000 loans) is around 0.17. The second most common APR is around 0.2. Most borrower APRs fall in the range of 0.12 to 0.35.

From the histogram, over 75,000 borrowers chose a loan term of 36 months, far more than chose 60 months. Very few borrowers chose a 12-month term.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.00   36.00   36.00   40.83   36.00   60.00
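
A five-number summary like the one above hides how few distinct terms there are. A frequency table makes that explicit. This is only a sketch: the `term` vector is a hypothetical sample of terms in months, not the real column.

```r
# Hypothetical sample of loan terms in months; not the real data.
term <- c(36, 36, 60, 36, 12, 60, 36)

# Five-number summary plus mean, in the same form as the output above.
print(summary(term))

# Frequency of each distinct term.
term_counts <- table(term)
print(term_counts)
```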

We can see that borrowers most often take a loan of around 4,000; 10,000 and 15,000 are also popular loan amounts. The frequency plot is positively skewed, with small peaks at every multiple of 5,000.

Basic Borrower Information

Now let’s dive into basic borrower info!

Over 60,000 borrowers are listed as employed, with full-time as the second most common status. Retired borrowers are the smallest group.

Prosper has seven loan grades called Prosper Ratings: AA, A, B, C, D, E and HR where AA is the lowest risk down to HR which actually stands for high risk.
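
Because the ratings have a natural order, it can help to store them as an ordered factor so that tables and plots follow the risk order rather than alphabetical order. The sketch below assumes the level order given in the text, from HR (highest risk) up to AA (lowest risk). The `ratings` vector is hypothetical.

```r
# Rating levels from highest risk to lowest risk, as described in the text.
rating_levels <- c("HR", "E", "D", "C", "B", "A", "AA")

# Hypothetical sample of ratings; not the real data.
ratings <- c("C", "AA", "HR", "B", "C")

# An ordered factor keeps the risk order in tables, plots and comparisons.
ratings_ord <- factor(ratings, levels = rating_levels, ordered = TRUE)

print(table(ratings_ord))

# Ordered factors support comparisons: is the second rating better than the third?
print(ratings_ord[2] > ratings_ord[3])
```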

Apart from the empty values, the distribution of this ordinal variable is roughly bell-shaped. ‘C’ is the most frequent rating in our data, and the highest (AA) and the lowest (HR) ratings are less common than the ratings in between.

The number of missing values is surprisingly high. After some research, I found that the Prosper Rating only exists for loans originated after July 2009. Cross-checking against the loan origination date confirmed this assumption.
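
One way to run such a cross-check is to tabulate "rating is blank" against "loan originated before the cutoff". The sketch below uses a hypothetical four-row data frame. The exact cutoff of 1 July 2009 is an assumption, since the text only says "July 2009".

```r
# Hypothetical rows; dates and ratings are made up for illustration.
loans <- data.frame(
  LoanOriginationDate = as.Date(c("2007-03-15", "2008-11-02",
                                  "2009-08-20", "2013-10-16")),
  ProsperRating..Alpha. = c("", "", "C", "A"),
  stringsAsFactors = FALSE
)

# Assumed cutoff; the text only says "July 2009".
cutoff <- as.Date("2009-07-01")

missing_rating <- loans$ProsperRating..Alpha. == ""
before_cutoff <- loans$LoanOriginationDate < cutoff

# If the assumption holds, all counts fall on the diagonal of this table.
cross_tab <- table(missing_rating, before_cutoff)
print(cross_tab)
```

Any count off the diagonal would indicate loans that contradict the assumption.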

We can see from this plot that California has the most borrowers, followed by Texas, New York and Florida.

From an income perspective, the most common income range is $25,000 - 49,999, followed by $50,000 - 74,999. A possible explanation is that many borrowers in the $25,000 - 49,999 range are new graduates or young professionals with strong demand for credit to purchase their first home or car.

Further, looking at the annual income of the borrowers, this positively skewed frequency plot agrees with the income range plot and is again consistent with strong buying power among young professionals.

Credit History

A large proportion of borrowers have only one delinquency on record. This is normally because they forgot to pay the annual fee for a credit card or missed one monthly loan repayment.

Checking the delinquencies over the last 7 years and the public records over the last 10 years, a single record is still the most common count. But it is strange to see that a few borrowers have more than 30 delinquencies.

How heavily do borrowers use their bank cards? For the majority of borrowers, bank card utilization is over 0.5, with a peak at around 0.9. This shows how heavily borrowers rely on their bank cards.

From the percentage of trades that were never delinquent, this negatively skewed histogram tells us that most borrowers maintain a good credit record and value the money they are lent. Only a few of them have a percentage below 0.50; we can use this record to select borrowers and decide whether we want to offer them a loan.

We also want to know how much debt burden our borrowers carry. The frequency plot shows that the majority of borrowers have a debt to income ratio of less than 0.40, with a peak at 0.20. The borrowers’ income can cover their debt payments, and the ratio also satisfies the rule of thumb that the debt to income ratio should stay below 0.8 or 0.85.

Univariate Analysis

What is the structure of your dataset?

The structure of the dataset covers the different loan interest rates, borrower’s employment status and income, liability ratio, delinquency records, etc.

What is/are the main feature(s) of interest in your dataset?

  1. Interest Rate: BorrowerAPR, BorrowerRate, LenderYield
  2. Credit Rating: ProsperRating..Alpha., ProsperScore
  3. Loan Status: LoanOriginalAmount, BorrowerState, LoanOriginationQuarter
  4. Borrower Profile: IncomeRange, DebtToIncomeRatio, EmploymentStatus, MonthlyLoanPayment
  5. Delinquency Record: CurrentDelinquencies, DelinquenciesLast7Years, PublicRecordsLast12Months

What other features in the dataset do you think will help support your
investigation into your feature(s) of interest?

  1. LP_CustomerPayments, LP_CustomerPrincipalPayments, LP_InterestandFees, LP_ServiceFees: these help show how much profit the bank or loan providers can gain
  2. IsBorrowerHomeowner, Occupation: these help identify the borrower’s income source

Did you create any new variables from existing variables in the dataset?

I created an annual income variable from the stated monthly income.
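
The exact calculation is not shown in the report. The most direct reading is monthly income multiplied by 12, which is what this sketch assumes. The column name `AnnualIncome` and the sample values are hypothetical.

```r
# Hypothetical stated monthly incomes; not the real data.
loans <- data.frame(StatedMonthlyIncome = c(3200, 4667, 6825))

# Assumed derivation: annual income = stated monthly income * 12.
# "AnnualIncome" is a hypothetical column name.
loans$AnnualIncome <- loans$StatedMonthlyIncome * 12

print(loans)
```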

Of the features you investigated, were there any unusual distributions?
Did you perform any operations on the data to tidy, adjust, or change the form
of the data? If so, why did you do this?

I cleaned the income range data, because the "Not displayed" and null values distort the percentage of each income range. I also excluded missing values in a few variables, such as DebtToIncomeRatio and BankcardUtilization.
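
The cleaning code itself is not part of the report, so the following is only a sketch of one way to do it in base R. The list of income levels to keep is an assumption based on the ranges named in this document, and the sample rows are made up.

```r
# Hypothetical rows; values are made up for illustration.
loans <- data.frame(
  IncomeRange = c("$25,000-49,999", "Not displayed", "$100,000+",
                  "$1-24,999", "Not employed", "$50,000-74,999"),
  DebtToIncomeRatio = c(0.22, 0.30, NA, 0.41, NA, 0.18),
  stringsAsFactors = FALSE
)

# Assumed set of levels to keep, in ascending order of income.
income_levels <- c("Not employed", "$1-24,999", "$25,000-49,999",
                   "$50,000-74,999", "$75,000-99,999", "$100,000+")

# Drop "Not displayed" (and anything else outside the known levels).
income_clean <- subset(loans, IncomeRange %in% income_levels)
income_clean$IncomeRange <- factor(income_clean$IncomeRange,
                                   levels = income_levels, ordered = TRUE)

# For ratio plots, additionally drop rows with a missing ratio.
dti_clean <- subset(income_clean, !is.na(DebtToIncomeRatio))

print(table(income_clean$IncomeRange))
print(nrow(dti_clean))
```

Keeping two separate data frames means the income range percentages still use every borrower with a known range, while ratio plots only lose the rows they cannot use.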

Bivariate Plots Section

The univariate analysis gave me a big picture of my dataset. In this section, I analyse the relationships between variables. My main focus is still on:

  - Customer Quality
  - Loan Provider Profitability
  - Credit Risk

Customer Quality

Let us take a closer look at our borrowers.

The loan amount increased slightly from the end of 2006 through 2007. However, as the subprime mortgage crisis spread across the nation and then worldwide, the loan amount jumped dramatically from 2008 to 2009. Since 2010, the loan amount has grown exponentially. More and more people need cash, and the financial market is very active.
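
A practical detail when plotting by quarter: LoanOriginationQuarter values look like "Q4 2013", so sorting them as text puts every Q1 before every Q2 regardless of year. The sketch below totals a hypothetical loan amount per quarter and reorders the result chronologically. Whether the original plot used totals or averages is not stated, so the use of `sum` is an assumption.

```r
# Hypothetical rows; amounts are made up for illustration.
loans <- data.frame(
  LoanOriginationQuarter = c("Q4 2013", "Q1 2014", "Q3 2012",
                             "Q4 2013", "Q3 2012"),
  LoanOriginalAmount = c(10000, 4000, 2500, 15000, 6000),
  stringsAsFactors = FALSE
)

# Total amount per quarter (assumed aggregation; names come out in text order).
totals <- tapply(loans$LoanOriginalAmount, loans$LoanOriginationQuarter, sum)

# Parse "Qn YYYY" into year and quarter number, then sort chronologically.
q <- names(totals)
q_year <- as.integer(substr(q, 4, 7))
q_num <- as.integer(substr(q, 2, 2))
totals_by_time <- totals[order(q_year, q_num)]

print(totals_by_time)
```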

From the above plot we can see that people with lower income tend to carry a heavier debt burden. The average debt to income ratio of the $1-24,999 group is slightly higher than that of the $25,000 - 49,999 group. It is worth noting that the $1-24,999 group has many more outliers than the others, and it is strange to see some borrowers with a debt to income ratio of 10.0.
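
Since outliers such as a ratio near 10 pull the group average upwards, it can be useful to compare the mean with the median for each income range. This is a minimal base R sketch on hypothetical values, not the computation behind the plot.

```r
# Hypothetical rows; one extreme ratio is included on purpose.
loans <- data.frame(
  IncomeRange = c("$1-24,999", "$1-24,999", "$1-24,999", "$1-24,999",
                  "$25,000-49,999", "$25,000-49,999"),
  DebtToIncomeRatio = c(0.30, 0.34, 10.01, NA, 0.25, 0.21),
  stringsAsFactors = FALSE
)

# Group-wise mean and median, ignoring missing ratios.
mean_dti <- tapply(loans$DebtToIncomeRatio, loans$IncomeRange,
                   mean, na.rm = TRUE)
median_dti <- tapply(loans$DebtToIncomeRatio, loans$IncomeRange,
                     median, na.rm = TRUE)

print(round(mean_dti, 3))
print(median_dti)
```

In this made-up sample the single extreme value moves the low-income mean far above its median, which is why the median is the more robust comparison when outliers are present.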

Comparing monthly loan payment and debt to income ratio against monthly income, monthly loan payments are heavily concentrated in the monthly income range of $0 to $10,000, while the debt to income ratio shows a positively skewed distribution.

This plot shows the Prosper rating for each employment status in each state. ND and IA have missing values. Self-employed borrowers in MA and OR have the worst Prosper rating (HR) and the highest risk. In contrast, self-employed borrowers in AL have the best Prosper rating (AA).

From the line plot, as the total trades grow, the average bank card utilization moves from 0.5 up to 0.75 and then back to 0.6. If the total trades number fewer than 50, bank card utilization is high, which means the borrower is more likely to use the bank card for transactions.

We also care about which state has the most borrowers. The highest number of borrowers is in CA, which is consistent with California consumers having some of the highest levels of debt in the country. Moreover, Texas, New York and Florida, which follow, can also be seen as big loan markets.

Borrowers with income over $100,000 have a higher chance of getting a loan over $12,000. The highest loan amount reaches $35,000. It’s interesting to see that borrowers who are not employed can get a higher loan amount than borrowers in the $1-24,999 income range.

Bank Profitability

The interest and fees follow the same pattern as the loan amount at the beginning but differ at the end. From this diagram, we can see that the interest and fees increased slightly from 2006 Q1 to 2007 Q2, then decreased slightly until 2008 Q4. They then dropped to the bottom in 2009 Q2, with no interest and fees at all, but bounced back afterwards and reached a peak in 2011 Q2. However, this growth did not last long: they kept falling after 2012 Q2.

For the HR Prosper rating, the estimated loss and estimated return are inversely related, because high-risk credit has a high chance of losing money and is hard to recover.

Credit Risk

The loan provider adopts different strategies towards borrowers with different Prosper ratings. In general, the borrower APR, borrower rate and lender yield follow a similar pattern across Prosper ratings. As expected, the loan provider charges the highest-risk borrowers the highest interest rate.

The current delinquencies plotted against borrower APR are widely spread from 20 to 32. Against borrower rate, they are widely spread from 15 to 28, with another peak at 35. Against lender yield, they are widely spread from 20 to 28, with another peak at 35. In general, however, the rates associated with the delinquencies are below 0.2.

A large proportion of borrowers with fewer than 10 current delinquencies have a debt to income ratio from 0 to 0.2. But it is strange to see that borrowers with around 100 delinquencies have a low debt to income ratio.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. How did the feature(s) of interest vary with other features in
the dataset?

Borrowers with lower income carry a heavier debt burden, according to the plot of debt to income ratio by income range.

Did you observe any interesting relationships between the other features
(not the main feature(s) of interest)?

It’s interesting to see that the loan original amount grew significantly from 2012 to 2014, while the interest and fees went down dramatically over the same period, even though they followed the same pattern as the loan original amount before 2012.

What was the strongest relationship you found?

The prosper rating and borrower rates have the strongest relationship.

Multivariate Plots Section

In this section, I will analyse client quality, loan provider profitability and credit risk together. Each plot gives an overall profile of these three aspects.

Across income ranges, the HR Prosper rating has the highest borrower APR, while the AA Prosper rating has the lowest. However, it’s interesting to see that the borrower APR doesn’t always go up as income increases. For example, the borrower APR of the E Prosper rating goes down as income grows. But if the borrower is not employed, the borrower APR is always higher than in any other income range.

## # A tibble: 50 x 4
##    LoanOriginalAmount ProsperRating..Alpha. mean_BorrowerAPR     n
##                 <int>                <fctr>            <dbl> <int>
##  1               1000                                     NA  2445
##  2               1000                     A       0.13733354    82
##  3               1000                    AA       0.07455752   105
##  4               1000                     B       0.18256000    16
##  5               1000                     C       0.22367717   173
##  6               1000                     D       0.27935232   138
##  7               1000                     E       0.34687778    99
##  8               1000                    HR       0.37400014   148
##  9               1001                             0.20628375     8
## 10               1005                             0.34020000     2
## # ... with 40 more rows
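
The first row of the table above has a mean of `NA`. In R, `mean()` returns `NA` whenever a group contains a missing value unless `na.rm = TRUE` is passed, and BorrowerAPR has 25 missing values in the summary. The tibble output suggests the original summary used dplyr. The sketch below reproduces the same kind of grouped summary in base R on hypothetical rows, with both variants of the mean side by side. The column `mean_BorrowerAPR_narm` is a hypothetical addition.

```r
# Hypothetical rows; one BorrowerAPR is missing on purpose.
loans <- data.frame(
  LoanOriginalAmount = c(1000, 1000, 1000, 1000, 2000),
  ProsperRating..Alpha. = c("A", "A", "", "", "A"),
  BorrowerAPR = c(0.13, 0.15, NA, 0.20, 0.12),
  stringsAsFactors = FALSE
)

# Split into one data frame per (amount, rating) pair, dropping empty pairs.
groups <- split(
  loans,
  list(loans$LoanOriginalAmount, loans$ProsperRating..Alpha.),
  drop = TRUE
)

# Summarise each group: mean with and without NA removal, and group size.
grouped <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    LoanOriginalAmount = g$LoanOriginalAmount[1],
    ProsperRating..Alpha. = g$ProsperRating..Alpha.[1],
    mean_BorrowerAPR = mean(g$BorrowerAPR),
    mean_BorrowerAPR_narm = mean(g$BorrowerAPR, na.rm = TRUE),
    n = nrow(g),
    stringsAsFactors = FALSE
  )
}))

print(grouped)
```

Note also that the blank rating appears as its own group, exactly as in the output above.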

The loan amounts provided to customers with different Prosper ratings and different borrower rates are similar. Most loan amounts are concentrated between $1,000 and $25,000.

## # A tibble: 50 x 4
##    LoanOriginalAmount BorrowerState mean_LenderYield     n
##                 <int>        <fctr>            <dbl> <int>
##  1               1000                      0.1584107   521
##  2               1000            AK        0.1755500     6
##  3               1000            AL        0.2531619    21
##  4               1000            AR        0.1879650    20
##  5               1000            AZ        0.1815436    55
##  6               1000            CA        0.1874984   251
##  7               1000            CO        0.1821296    54
##  8               1000            CT        0.1660714    21
##  9               1000            DC        0.1987000     3
## 10               1000            DE        0.1467000     8
## # ... with 40 more rows

Very high-interest loans have boomed in CA. We can also spot a few other states with very high interest rates and loan amounts, such as FL, OR, NY, DC and MA.

## # A tibble: 50 x 4
##    IncomeRange DebtToIncomeRatio mean_CurrentDelinquencies     n
##         <fctr>             <dbl>                     <dbl> <int>
##  1   $1-24,999              0.02                 0.8571429     7
##  2   $1-24,999              0.03                 2.2000000    20
##  3   $1-24,999              0.04                 1.1351351    37
##  4   $1-24,999              0.05                 1.5937500    64
##  5   $1-24,999              0.06                 1.7666667    60
##  6   $1-24,999              0.07                 1.0416667    72
##  7   $1-24,999              0.08                 1.8390805    87
##  8   $1-24,999              0.09                 1.0952381    84
##  9   $1-24,999              0.10                 1.1372549   102
## 10   $1-24,999              0.11                 1.0265487   113
## # ... with 40 more rows

The average current delinquencies are very high among the $1-24,999 and $50,000-74,999 income groups. A likely reason is that these groups have more borrowers and very high loan amounts. The $100,000+ income group, however, is more prone to delinquency, possibly because this group uses the loan money for block trades.

## # A tibble: 50 x 4
##    ProsperRating..Alpha. EstimatedLoss mean_CurrentDelinquencies     n
##                   <fctr>         <dbl>                     <dbl> <int>
##  1                     A        0.0200                0.23536036   888
##  2                     A        0.0210                0.06650446  1233
##  3                     A        0.0224                0.08321580   709
##  4                     A        0.0249                0.16837482  1366
##  5                     A        0.0260                0.19602978   403
##  6                     A        0.0274                0.11344137  1049
##  7                     A        0.0299                0.13573620  1304
##  8                     A        0.0324                0.16180150  1199
##  9                     A        0.0325                0.09433962   106
## 10                     A        0.0330                0.13391557   687
## # ... with 40 more rows

Average current delinquencies are highest for Prosper rating HR, followed by rating E. Borrowers rated AA are the least likely to be delinquent.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

I was glad to find relationships between the interest rate and borrowers in different income groups and with different Prosper ratings. This is meaningful because it gives loan providers some ideas about how to price the interest rate according to customer quality.

Were there any interesting or surprising interactions between features?

I was surprised to see that borrowers with income above $100,000 appear to be linked to high levels of delinquency. I had previously thought their stronger capital position would make delinquency unlikely.


Final Plots and Summary

Plot One

Description One

Customer Quality

This plot shows customer quality from two aspects: original loan amount and income range. Loan providers are interested in how large a loan they can offer to a customer and, basically, which income range that customer falls in.

From this plot, we can see that the $1-24,999 income group most often borrows around $4,300; the $25,000-49,999 group around $6,100; the $50,000-74,999 group around $8,800; the $75,000-99,999 group around $11,400; and the $100,000+ group around $13,000.
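The typical loan amount per income group can also be checked numerically. The sketch below is illustrative only: the data frame `loans` and its values are hypothetical, dplyr is assumed, and since the report does not state which statistic the plot marks, both the median and the mean are computed. The income ranges are turned into an ordered factor so the groups appear from lowest to highest income rather than alphabetically.

```r
library(dplyr)

# Income groups in ascending order, as named in the description above.
income_levels <- c("$1-24,999", "$25,000-49,999", "$50,000-74,999",
                   "$75,000-99,999", "$100,000+")

# Toy stand-in for the loan data; values are made up for illustration only.
loans <- data.frame(
  IncomeRange        = c("$100,000+", "$1-24,999", "$1-24,999",
                         "$25,000-49,999", "$100,000+", "$1-24,999"),
  LoanOriginalAmount = c(15000, 4000, 3000, 6000, 10000, 5000)
)

typical_amount <- loans %>%
  # Values outside income_levels become NA and are dropped.
  mutate(IncomeRange = factor(IncomeRange, levels = income_levels)) %>%
  filter(!is.na(IncomeRange)) %>%
  group_by(IncomeRange) %>%
  summarise(
    median_amount = median(LoanOriginalAmount),
    mean_amount   = mean(LoanOriginalAmount),
    n             = n(),
    .groups       = "drop"
  )

print(typical_amount)
```

The same factor ordering can be reused on the plot axis so that the income groups read left to right in ascending order.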

Plot Two

Description Two

Loan Provider Profitability

This time I wanted to know how much a loan provider can earn from lending. I checked average interest and fees from 2006 Q1 to 2014 Q1. Interest and fees increased slightly from the end of 2006 to 2007. They were then affected by the subprime mortgage crisis and dropped significantly from 2008 to 2009.

What is exciting is that interest and fees kept growing from 2010 to 2011 and then jumped sharply from 2012 to 2014. The decrease in loan interest rates in 2012 came as banks started to provide more loans to borrowers, and loan providers lost a large part of their share of the US financial market.

Based on this diagram, we can see that loan provider profitability is heavily affected by the financial market. This is important for loan providers to consider when measuring their profitability.
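A quarterly average like the one described for this plot could be computed as sketched below. This is an illustration under stated assumptions, not the report's actual code: the data frame `loans` and its values are hypothetical, dplyr is assumed, the quarter labels are assumed to look like "Q1 2014", and `LP_InterestandFees` is assumed to be the column holding interest and fees. Because labels in that format sort alphabetically rather than chronologically, the year and quarter are extracted first and used for ordering.

```r
library(dplyr)

# Toy stand-in for the loan data; values are made up for illustration only.
loans <- data.frame(
  LoanOriginationQuarter = c("Q1 2014", "Q4 2013", "Q1 2014", "Q2 2007", "Q4 2013"),
  LP_InterestandFees     = c(100, 300, 200, 900, 500)
)

fees_by_quarter <- loans %>%
  mutate(
    # "Q1 2014" -> year 2014, quarter 1 (label format is an assumption).
    year    = as.integer(sub("^Q[1-4] ", "", LoanOriginationQuarter)),
    quarter = as.integer(substr(LoanOriginationQuarter, 2, 2))
  ) %>%
  group_by(year, quarter, LoanOriginationQuarter) %>%
  summarise(
    mean_interest_and_fees = mean(LP_InterestandFees),
    n = n(),
    .groups = "drop"
  ) %>%
  # Chronological order, so a line plot over the rows reads left to right in time.
  arrange(year, quarter)

print(fees_by_quarter)
```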

Plot Three

## # A tibble: 50 x 4
##    ProsperRating..Alpha. EstimatedLoss mean_CurrentDelinquencies     n
##                   <fctr>         <dbl>                     <dbl> <int>
##  1                     A        0.0200                0.23536036   888
##  2                     A        0.0210                0.06650446  1233
##  3                     A        0.0224                0.08321580   709
##  4                     A        0.0249                0.16837482  1366
##  5                     A        0.0260                0.19602978   403
##  6                     A        0.0274                0.11344137  1049
##  7                     A        0.0299                0.13573620  1304
##  8                     A        0.0324                0.16180150  1199
##  9                     A        0.0325                0.09433962   106
## 10                     A        0.0330                0.13391557   687
## # ... with 40 more rows

Description Three

Credit Risk

Borrowers with Prosper rating AA have the lowest estimated loss and the lowest delinquencies. The points for Prosper ratings B, C, E and HR are widely scattered. Prosper rating HR has the highest estimated loss, above 0.3, and the highest average current delinquencies, over 24. This plot shows the credit risk and profit loss associated with each Prosper rating. Overall, the distribution looks reasonable.

Reflection

In this report, I was glad to find relationships between Prosper rating and interest rate, delinquency and income range, delinquency and debt-to-income ratio, and income range and loan amount.

During this EDA project, I struggled with the choice of plot type, variables, and aesthetic parameters (e.g. bin width, color, axis breaks), and I worked hard to make each plot display appropriately. I also considered how to avoid overplotting and how to make sure the axis labels were not cut off. In the bivariate and multivariate analysis sections especially, I needed to follow my own logic and pick the right variables to show the ‘Customer Quality’, ‘Loan Provider Profitability’, and ‘Credit Risk’ outcomes.

What I achieved was an understanding of which plot types suit univariate analysis and which suit bivariate analysis. I showed the basic borrower profile and displayed the interest rate and loan amount across states and employment statuses. I also demonstrated the relationship between Prosper rating and current delinquency records.

In the future, I am keen to explore how to set interest rates for customers of different quality.